plan verification
VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots
Grigorev, Danil S., Kovalev, Alexey K., Panov, Aleksandr I.
In the field of robotics, researchers face a critical challenge in ensuring reliable and efficient task planning. Verifying high-level task plans before execution significantly reduces errors and enhances the overall performance of these systems. In this paper, we propose an architecture for automatically verifying high-level task plans before their execution in simulator or real-world environments. Leveraging Large Language Models (LLMs), our approach consists of two key steps: first, the conversion of natural language instructions into Linear Temporal Logic (LTL), followed by a comprehensive analysis of action sequences. The module uses the reasoning capabilities of the LLM to evaluate logical coherence and identify potential gaps in the plan. Rigorous testing on datasets of varying complexity demonstrates the broad applicability of the module to household tasks. We contribute to improving the reliability and efficiency of task planning and address the critical need for robust pre-execution verification in autonomous systems. The code is available at https://verifyllm.github.io.
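To make the idea of checking an action sequence against temporal constraints concrete, here is a minimal sketch in Python. It is not the VerifyLLM implementation: the action names, the two constraint helpers (`eventually`, `precedes`), and the `verify` function are hypothetical, and the constraints are evaluated directly on a finite trace instead of being produced by an LLM from natural language.

```python
from typing import Callable, Dict, List

Trace = List[str]  # a plan as a finite sequence of action names (assumed encoding)


def eventually(action: str) -> Callable[[Trace], bool]:
    """Finite-trace reading of `F action`: the action occurs somewhere in the plan."""
    return lambda trace: action in trace


def precedes(first: str, second: str) -> Callable[[Trace], bool]:
    """Weak-until reading of `(!second) W first`: `second` never occurs before `first`."""
    def check(trace: Trace) -> bool:
        for step in trace:
            if step == first:
                return True
            if step == second:
                return False
        return True  # neither action occurred, so the constraint holds vacuously
    return check


def verify(trace: Trace, constraints: Dict[str, Callable[[Trace], bool]]) -> List[str]:
    """Return the names of all constraints that the plan violates."""
    return [name for name, check in constraints.items() if not check(trace)]


# Hypothetical household task: the fridge must be opened before the milk is taken.
constraints = {
    "F grab_milk": eventually("grab_milk"),
    "open_fridge before grab_milk": precedes("open_fridge", "grab_milk"),
}
good_plan = ["open_fridge", "grab_milk", "close_fridge"]
bad_plan = ["grab_milk", "open_fridge"]
```

Calling `verify(good_plan, constraints)` yields an empty list, while `verify(bad_plan, constraints)` reports the violated ordering constraint, which is the kind of gap a pre-execution check is meant to surface.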
Planning-Driven Programming: A Large Language Model Programming Workflow
Lei, Chao, Chang, Yanchuan, Lipovetzky, Nir, Ehinger, Krista A.
The strong performance of large language models (LLMs) raises extensive discussion on their application to code generation. Recent research suggests continuous program refinements through visible tests to improve code generation accuracy in LLMs. However, these methods suffer from LLMs' inefficiency and limited reasoning capacity. In this work, we propose an LLM programming workflow (LPW) designed to improve both initial code generation and subsequent refinements within a structured two-phase workflow. Specifically, the solution generation phase formulates a solution plan, which is then verified through visible tests to specify the intended natural language solution. Subsequently, the code implementation phase drafts an initial code according to the solution plan and its verification. If the generated code fails the visible tests, the plan verification serves as the intended solution to consistently inform the refinement process for correcting bugs. Compared to state-of-the-art methods across various existing LLMs, LPW significantly improves the Pass@1 accuracy by up to 16.4% on well-established text-to-code generation benchmarks. LPW also sets new state-of-the-art Pass@1 accuracy, achieving 98.2% on HumanEval, 84.8% on MBPP, 59.3% on LiveCode, 62.6% on APPS, and 34.7% on CodeContest, using GPT-4o as the backbone.
This Is a Solution! (... But Is It Though?) - Verifying Solutions of Hierarchical Planning Problems
Behnke, Gregor (Ulm University) | Höller, Daniel (Ulm University) | Biundo, Susanne (Ulm University)
Plan-Verification is the task of determining whether a plan is a solution to a given planning problem. Any plan verifier has, apart from showing that verifying plans is possible in practice, a wide range of possible applications. These include mixed-initiative planning, where a user is integrated into the planning process, and local search, e.g., for post-optimising plans or for plan repair. In addition to its practical interest, plan verification is also a problem worth investigating for theoretical reasons. Recent work showed plan verification for hierarchical planning problems to be NP-complete, as opposed to classical planning where it is in P. As such, plan verification for hierarchical planning problems was, until now, not possible. We describe the first plan verifier for hierarchical planning. It uses a translation of the problem into a SAT formula. Further, we conduct an empirical evaluation, showing that the correct output is produced within acceptable time.
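For contrast with the hierarchical case, the following sketch shows why verification is easy in classical planning: a plan can be checked by simulating it once, in time linear in its length. This is an illustration under a STRIPS-style encoding assumed here, not the SAT-based verifier described in the abstract; the `Action` class, `verify_plan`, and the example facts are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Sequence


@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style action (assumed encoding for illustration)."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str]


def verify_plan(initial_state: Iterable[str],
                goal: Iterable[str],
                plan: Sequence[Action]) -> bool:
    """Simulate the plan step by step and check that the goal holds at the end."""
    state = set(initial_state)
    for action in plan:
        # Every precondition must hold in the current state.
        if not action.preconditions <= state:
            return False
        # Apply delete effects first, then add effects.
        state = (state - action.delete_effects) | action.add_effects
    return set(goal) <= state


# Hypothetical one-step example: picking up a block from the table.
pick = Action(
    name="pick",
    preconditions=frozenset({"hand_empty", "on_table"}),
    add_effects=frozenset({"holding"}),
    delete_effects=frozenset({"hand_empty", "on_table"}),
)
```

With this encoding, `verify_plan({"hand_empty", "on_table"}, {"holding"}, [pick])` succeeds, and the same plan fails from a state without `hand_empty`. In hierarchical planning the verifier must additionally establish that the action sequence can be derived from the initial task network through some decomposition, which a single forward simulation like this cannot decide.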
Automaton Plans
Bäckström, C., Jonsson, A., Jonsson, P.
Macros have long been used in planning to represent subsequences of operators. Macros can be used in place of individual operators during search, sometimes reducing the effort required to find a plan to the goal. Another use of macros is to compactly represent long plans. In this paper we introduce a novel solution concept called automaton plans in which plans are represented using hierarchies of automata. Automaton plans can be viewed as an extension of macros that enables parameterization and branching. We provide several examples that illustrate how automaton plans can be useful, both as a compact representation of exponentially long plans and as an alternative to sequential solutions in benchmark domains such as Logistics and Grid. We also compare automaton plans to other compact plan representations from the literature, and find that automaton plans are strictly more expressive than macros, but strictly less expressive than HTNs and certain representations allowing efficient sequential access to the operators of the plan.
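The claim that macros can compactly represent long plans, with efficient sequential access to individual operators, can be illustrated with a small sketch. It covers plain nested macros only, not automaton plans, which additionally allow parameterization and branching. The macro table, the operator names `op_a` and `op_b`, and all function names are hypothetical.

```python
from typing import Dict, List

Macros = Dict[str, List[str]]  # macro name -> sequence of macros or primitive operators


def build_macros(depth: int) -> Macros:
    """Build nested macros where each level repeats the previous level twice."""
    macros: Macros = {"m0": ["op_a", "op_b"]}
    for i in range(1, depth + 1):
        macros[f"m{i}"] = [f"m{i - 1}", f"m{i - 1}"]
    return macros


def compute_lengths(macros: Macros) -> Dict[str, int]:
    """Compute the expanded length of every macro without expanding it."""
    lengths: Dict[str, int] = {}

    def length(symbol: str) -> int:
        if symbol not in macros:
            return 1  # primitive operator
        if symbol not in lengths:
            lengths[symbol] = sum(length(child) for child in macros[symbol])
        return lengths[symbol]

    for symbol in macros:
        length(symbol)
    return lengths


def operator_at(symbol: str, index: int, macros: Macros, lengths: Dict[str, int]) -> str:
    """Return the operator at position `index` of the expanded plan by descending the hierarchy."""
    while symbol in macros:
        for child in macros[symbol]:
            n = lengths.get(child, 1)
            if index < n:
                symbol = child
                break
            index -= n  # skip this child entirely
        else:
            raise IndexError("index beyond the end of the plan")
    if index != 0:
        raise IndexError("index out of range")
    return symbol


macros = build_macros(50)
lengths = compute_lengths(macros)
```

Here 51 macro definitions stand for a plan of `2 ** 51` operators, and `operator_at("m50", k, macros, lengths)` finds the operator at position `k` by walking down the hierarchy instead of materializing the plan.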
Proof System for Plan Verification under 0-Approximation Semantics
In this paper a proof system is developed for plan verification problems $\{X\}c\{Y\}$ and $\{X\}c\{KW p\}$ under 0-approximation semantics for ${\mathcal A}_K$. Here, for a plan $c$, two sets $X,Y$ of fluent literals, and a literal $p$, $\{X\}c\{Y\}$ (resp. $\{X\}c\{KW p\}$) means that all literals of $Y$ become true (resp. $p$ becomes known) after executing $c$ in any initial state in which all literals in $X$ are true. Then, soundness and completeness are proved. The proof system allows verifying plans and generating plans as well.